Experimental GPU support for Windows #103

mworchel · 2020-01-30T20:40:15Z

This PR enables CUDA compilation for Windows and therefore introduces experimental GPU support. I re-organized parts of the CMakeLists.txt to have an (arguably) more ordered structure. Pybind support is now added by linking against an exported target instead of using the call to pybind11_add_module (which would interfere with usage of cuda_add_library).

Path tracing still does not work. However, albedo and deferred rendering work fine. So does backpropagation, at least in GPU mode (see #93).

I didn't test the changes on linux or ios, so that's up for testing.

bjorn3 · 2020-01-30T20:42:20Z

CMakeLists.txt

+    # The "-undefined dynamic_lookup" is a hack for systems with
+    # multiple Python installed. If we link a particular Python version
+    # here, and we import it with a different Python version later.
+    # likely a segmentation fault.


This sentence seems to be missing the beginning. (pre-existing)

Probably a merge error, or I was just typing nonsense. Will fix it at some point.

BachiLi · 2020-02-01T18:57:48Z

@mworchel This works fine on my side and I'll merge it. I'm curious about the path tracing issue. What is wrong with it?

mworchel · 2020-02-02T11:55:49Z

Thanks for merging.

Regarding the path tracing, I noticed that whenever I try to do it in GPU mode under Windows, the application crashes (in CPU mode it works fine, although backprop does not work there). I invested some time and found that the crash occurs when calling update_active_pixels (https://github.com/BachiLi/redner/blob/master/pathtracer.cpp#L384). The call to thrust::copy_if in this function fails at device synchronization. The issue is potentially caused by an earlier call to the CUDA API, but the error only surfaces at this point because of the sync.

I ran a cuda-memcheck but I think I didn't see any specific errors. I could try to place device sync points after the preceding functions like accumulate_path_contribs or intersect to narrow it further down.

BachiLi · 2020-02-02T18:28:49Z

Interesting. The most suspicious candidate for this is the sample_point_on_light procedure, which performs a binary search over an array, which is allocated on another array, where the array is allocated on CUDA unified memory (no typo here). I suspect that unified memory works differently under Windows and is causing this issue. This might also be the reason why some of the older GPUs caused crashes in redner. Not sure about the solution yet if this is the cause.
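For readers unfamiliar with the pattern, the kind of binary search mentioned above can be sketched in plain host C++ roughly as follows. This is not redner's sample_point_on_light: the name sample_index and the CDF layout are assumptions made for illustration, and the unified-memory aspect is not modelled at all.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of redner): pick an index from a discrete
// distribution given its inclusive prefix sums, i.e.
// cdf[i] = p[0] + ... + p[i] with cdf.back() == 1.
// u is a uniform random number in [0, 1). cdf must not be empty.
inline std::size_t sample_index(const std::vector<double>& cdf, double u) {
    // Binary search for the first entry strictly greater than u.
    auto it = std::upper_bound(cdf.begin(), cdf.end(), u);
    // Clamp in case rounding makes u >= cdf.back().
    if (it == cdf.end()) {
        return cdf.size() - 1;
    }
    return static_cast<std::size_t>(it - cdf.begin());
}
```

On the GPU the array being searched would live in device-accessible memory, which is the part suspected here.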

mworchel · 2020-02-03T12:46:11Z

I added multiple sync points and, interestingly, the call to accumulate_path_contribs is the one that fails, causing an unspecified launch failure. I commented out nearly everything in the path_contribs_accumulator, leaving only something like

if (rendered_image != nullptr) {
    auto nd = channel_info.num_total_dimensions;
    auto d = channel_info.radiance_dimension;
    rendered_image[nd * pixel_id + d]     += 0.5 /*weight * path_contrib[0]*/;
    rendered_image[nd * pixel_id + d + 1] += 0.5 /*weight * path_contrib[1]*/;
    rendered_image[nd * pixel_id + d + 2] += 0.5 /*weight * path_contrib[2]*/;
}
if (edge_contribs != nullptr) {
    edge_contribs[pixel_id] += sum(weight * 0.5/* path_contrib*/);
}

and with this, the forward pass succeeds. I'll try to comment in stuff one after another and pin the error down some more. Maybe you already have some kind of suspicion? "Luckily" it even fails with a 1x1 image, so I can watch one pixel in isolation.

EDIT: Commenting in this line (https://github.com/BachiLi/redner/blob/master/path_contribution.cpp#L38)

auto bsdf_val = bsdf(material, shading_point, wi, wo, min_rough);

triggers the error. Maybe some out of bounds access on the textures?

EDIT2: Hmm ok, when I comment out lines https://github.com/BachiLi/redner/blob/master/material.h#L404-L441 (specular component), the forward pass succeeds and I get the correct diffuse colors.

EDIT3: When I comment out these lines https://github.com/BachiLi/redner/blob/master/material.h#L435-L440 the whole thing doesn't compile anymore with some template error in a thrust header. Ok, what?

mworchel · 2020-02-03T17:09:57Z

Ok, so I was able to fix the error (currently only trying the forward pass) by factoring the Fresnel term into two distinct terms, so replacing

auto F = specular_reflectance +
    (1.f - specular_reflectance) *
    pow(max(1.f - cos_theta_d, Real(0)), 5.f);

by

Vector3 F1 = specular_reflectance;
Vector3 F2 = (1.f - specular_reflectance) * pow(max(1.0 - cos_theta_d, Real(0)), 5.0);
auto F = F1 + F2;

I'm not sure what the root cause is, but I'd guess it has something to do with accessing references to temporary objects in the overloaded operators. Although the code looks fine to me.

EDIT: The fix is extremely brittle, though. If you move the part pow(max(1.0 - cos_theta_d, Real(0)), 5.0) into its own variable, it won't compile anymore (thrust template errors as above) and if you replace 1.0 and 5.0 with 1.f and 5.f you get the CUDA runtime errors.
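For reference, the Fresnel expression above has the form F = F0 + (1 - F0) * (1 - cos_theta_d)^5. A scalar host-side sketch of it is given below. The function name schlick_fresnel and the use of plain double instead of redner's Vector3 and Real types are assumptions for illustration; it says nothing about how nvcc compiles the device version.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical scalar version of the Fresnel term discussed above.
// f0 plays the role of specular_reflectance; cos_theta_d is the cosine
// term of the same name in the thread.
inline double schlick_fresnel(double f0, double cos_theta_d) {
    // Clamp the base to be non-negative, as the original expression does.
    const double base = std::max(1.0 - cos_theta_d, 0.0);
    // All operands are double, so the (double, double) pow is used.
    return f0 + (1.0 - f0) * std::pow(base, 5.0);
}
```

At cos_theta_d = 1 the second term vanishes and the result is f0, which is a quick sanity check when comparing against the vector version.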

BachiLi · 2020-02-03T18:46:51Z

I am not using any arcane expression template magic here, so I don't think it's the reference problem you mentioned. This is weird to a point that I start to suspect it's a compiler bug in nvcc.

mworchel · 2020-02-03T21:34:52Z

It definitely looks like something is very wrong. Especially the compiler error when commenting out certain lines is weird. I wouldn't be surprised if it is on nvcc's side, yes. The only stuff in the code that looks remotely suspicious, as I'm not sure it's guaranteed to work by the C++ standard, is the operator[] of TVectorN and TFrame (I mean the *(&x + i); stuff). It assumes the members are laid out linearly without padding, but I think a compiler is allowed to do weird stuff like aligning the members to certain byte boundaries, where this code might break?! At this point I'm just desperately trying to come up with some logical explanation.
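To make the layout concern concrete, here is a minimal host-side sketch of an index operator that avoids the pointer arithmetic, plus compile-time checks of the no-padding assumption. Vec3 is a hypothetical stand-in, not redner's TVector3, and nothing here is claimed to be related to the crash.

```cpp
#include <cstddef>

// Hypothetical stand-in for a 3-component vector; not redner's TVector3.
struct Vec3 {
    double x, y, z;

    // Index via a switch instead of *(&x + i), which steps a pointer past a
    // single member and is not guaranteed to work by the C++ standard.
    double& operator[](std::size_t i) {
        switch (i) {
            case 0: return x;
            case 1: return y;
            default: return z;
        }
    }
};

// Compile-time checks that the members are laid out without padding,
// which is the assumption the pointer-arithmetic version relies on.
static_assert(sizeof(Vec3) == 3 * sizeof(double), "unexpected padding in Vec3");
static_assert(offsetof(Vec3, y) == sizeof(double), "y is not adjacent to x");
static_assert(offsetof(Vec3, z) == 2 * sizeof(double), "z is not adjacent to y");
```

If the static_asserts pass on a given compiler, padding is at least ruled out as an explanation on that compiler.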

mworchel · 2020-02-04T15:15:15Z

Ok, the compiler errors might be caused by some incremental build stuff going wrong or some other voodoo. It's not as consistently reproducible as I thought.

However, I was able to fix the forward path tracing pass with even fewer changes. Just replacing

// Schlick's approximation
auto F = [...] * pow(max(1.f - cos_theta_d, Real(0)), 5.f);

by

// Schlick's approximation
auto F = [...] * pow(max(1.f - cos_theta_d, Real(0)), 5.0);

so, replacing 5.f by 5.0. I really don't know which pow function is used in the above case, as it's called with arguments (double, float) but that seems to cause the issue. The latter case calls pow(double, double) and everything is fine.
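As a point of comparison, in standard host C++ (C++11 and later) the <cmath> overloads promote mixed arguments, so std::pow(double, float) is computed in double. Whether the device-side pow that nvcc picks behaves the same way is exactly what is unclear here; the sketch below only checks the host behaviour, and pow_dd is a hypothetical helper name.

```cpp
#include <cmath>
#include <type_traits>

// Host-side check only: with <cmath>, a (double, float) call is promoted and
// returns double, the same type as the (double, double) call.
static_assert(std::is_same<decltype(std::pow(1.0, 5.f)), double>::value,
              "std::pow(double, float) should return double on the host");
static_assert(std::is_same<decltype(std::pow(1.0, 5.0)), double>::value,
              "std::pow(double, double) should return double");
static_assert(std::is_same<decltype(std::pow(1.f, 5.f)), float>::value,
              "std::pow(float, float) should return float");

// Hypothetical helper that forces the (double, double) overload regardless of
// the argument types the caller passes in.
inline double pow_dd(double base, double exponent) {
    return std::pow(base, exponent);
}
```

Routing every call through a wrapper like this is one way to make sure no mixed-type call slips back in.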

I cannot believe this is the fix, but making sure the (double, double) version is called everywhere even makes it possible to use path tracing with backpropagation now!

I'll test it on one more machine and will send a PR later.

mworchel · 2020-06-16T19:20:15Z

> This is weird to a point that I start to suspect it's a compiler bug in nvcc.

@BachiLi, I was just reminded of this PR and the discussion when I tried to compile Redner today and faced the same weird compilation error again. After all, you were right! It's a bug in NVCC which should be fixed in CUDA 11:
NVIDIA/thrust#1090

Leaving this here for documentation purposes.

Support GPU target for Win32

11ba61d

bjorn3 reviewed Jan 30, 2020

BachiLi merged commit 59c1bb5 into BachiLi:master Feb 1, 2020

mworchel deleted the windows-gpu-support branch February 2, 2020 12:24

mworchel mentioned this pull request Feb 4, 2020

Fix GPU pathtracing under Windows #105

Merged
